Numpy and Simulations

Random numbers

An important part of any simulation is an element of randomness. If, for example, there are 10,000 possible things that can happen in your simulation (e.g. genes to mutate or molecules to interact) you want to understand the effects of as many of them as possible. One way to do that is to try them one at a time, starting with the first and going through the list. This might be impractical due to it taking too much time, computer memory, etc.

Another option is to "sample" from the possible options, e.g. just try out 100 of them. This will take less time but you must be careful how you choose which ones to try. A naïve approach would be to do the first 100 in the list, but this will very likely lead to a bias (perhaps all the genes controlling movement are near the beginning of the genome).

The best way to avoid these biases is to select your sample as randomly as possible.

numpy provides lots of very good tools for generating random numers, depending on exactly what you want to use them for.

In [1]:
from numpy.random import default_rng

rng = default_rng()

This rng is our "random number generator". Depending on what we want from it, it can provide randomness in lots of different ways.

The simplest is the random method it provides:

In [2]:
rng.random()
Out[2]:
0.8153786322096002

You will get a different number to what I have here and if you rerun the line of code, it will keep giving you different values. They will always be $0 \le r \lt 1.0$ and should sample over that range evenly.

As we saw with np.zeros and np.ones in the last chapter, you can request an array of random numbers by passing in a size argument. So to get three random numbers:

In [3]:
rng.random(size=3)
Out[3]:
array([0.78074909, 0.98814236, 0.24077928])

or even a $2\times4$ grid:

In [4]:
rng.random(size=(2, 4))
Out[4]:
array([[0.29462796, 0.88065677, 0.93041109, 0.26346213],
       [0.48620935, 0.8908907 , 0.75843885, 0.6226575 ]])

If you want integer values instead then there is an integers method which give you integers in the range from 0 to the value you specify:

In [5]:
rng.integers(20)
Out[5]:
19

Or again, asking for 10 of them:

In [6]:
rng.integers(20, size=10)
Out[6]:
array([ 3, 18,  4,  3, 12, 16, 19,  8,  1,  0])

You can also specify the lower bound explicitly, so to get numbers between 4 and 6:

In [7]:
rng.integers(low=4, high=6, size=10)
Out[7]:
array([5, 5, 5, 5, 5, 4, 5, 4, 5, 5])

You'll notice that you'll always get only 4 or 5. By default the integers method doesn't include the upper bound (just like with range()). You can tell it to include it by setting endpoint=True:

In [8]:
rng.integers(low=4, high=6, endpoint=True, size=10)
Out[8]:
array([5, 4, 4, 5, 6, 5, 4, 4, 5, 5])

Exercise 1

Generate an array with 3 rows and 5 columns, containing random integers which are $-5 < r <= 8$

Pseudo-randomness

Random number generators in computers work by implementing an algorithm. Algorithms like this are purely deterministic, that is, if you start with the same input and run them many times, they'll always give you the same output. This is because at their core they are just performing mathematical operations to numbers. For this reason, they are often referred to as "pseudorandom number generators".

We can see this action by explicitly setting the place from which the random number generator should start with the seed:

In [9]:
seeded_rng_1 = default_rng(seed=42)
seeded_rng_1.random(size=5)
Out[9]:
array([0.77395605, 0.43887844, 0.85859792, 0.69736803, 0.09417735])

If we make a second random number generator, but set the same seed, we will see exactly the same "random" numbers being produced:

In [10]:
seeded_rng_2 = default_rng(seed=42)
seeded_rng_2.random(size=5)
Out[10]:
array([0.77395605, 0.43887844, 0.85859792, 0.69736803, 0.09417735])

If we set the seed to a different value, then we will get a different sequence:

In [11]:
seeded_rng_3 = default_rng(seed=8)
seeded_rng_3.random(size=5)
Out[11]:
array([0.32697228, 0.98727684, 0.31871084, 0.78854894, 0.86989651])

If we do not set any seed, then the operating system will provide a random starting point for us.

Having a fixed seed is useful if we want random-type behaviour but also repeatability. This is particularly useful when debugging your code.